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A topological multiple testing scheme for one-dimensional do- 
mains is proposed where, rather than testing every spatial or tempo- 
ral location for the presence of a signal, tests are performed only at 
pH ' the local maxima of the smoothed observed sequence. Assuming uni- 

^^ ' modal true peaks with finite support and Gaussian stationary ergodic 

noise, it is shown that the algorithm with Bonferroni or Benjamini- 

Hochberg correction provides asymptotic strong control of the family 

C^ ' wise error rate and false discovery rate, and is power consistent, as the 

search space and the signal strength get large, where the search space 
may grow exponentially faster than the signal strength. Simulations 
show that error levels are maintained for nonasymptotic conditions, 
and that power is maximized when the smoothing kernel is close in 
shape and bandwidth to the signal peaks, akin to the matched fil- 
ter theorem in signal processing. The methods are illustrated in an 
analysis of electrical recordings of neuronal cell activity. 
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Cf^ I 1. Introduction. One of the most challenging aspects of multiple testing 

^^ ' problems in spatial and temporal domains is how to account for the spatial 

j_J . or temporal structure in the underlying signal. The usual paradigm considers 
a separate test at each observed location. However, the interest is usually in 

.^_, detecting signal regions that span several neighboring locations. This paper 

^ ! considers a new multiple testing paradigm for spatial and temporal domains 

H ' where tests are not performed at every observed location, but only at the 
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local maxima of the observed data, seen as representatives of underlying 
signal peak regions. The proposed inference is not pointwise but topological, 
based on the observed local maxima as topological features. 

In pointwise testing, the control of family-wise error rate (FWER), now 
common in neuroimaging, was established by Keith Worsley [Taylor and 
Worsley (2007), Worsley et al. (1996b, 2004)], who exploited the Euler 
characteristic heuristic for approximating the distribution of the maximum 
of a random field [Adler and Taylor (2007), Adler, Taylor and Worsley 
(2010)]. Methods for controlling the false discovery rate (FDR) [Benjamini 
and Hochberg (1995)] are also applied routinely in this setting, but the spa- 
tial structure is difficult to incorporate and often ignored [Genovese, Lazar 
and Nichols (2002), Nichols and Hayasaka (2003), Schwartzman, Dougherty 
and Taylor (2008)]. 

Despite pointwise testing being so common, the real interest is usually not 
in detecting individual locations, but connected regions or clusters. This has 
prompted the adaptation of discrete FDR methods to pre-defined clusters 
[Benjamini and Heller (2007), Heller et al. (2006)], and the use of Gaussian 
random field theory for computing p- values corresponding to the height, ex- 
tent and mass of clusters obtained by pre-thresholding the observed random 
field [Poline et al. (1997), Zhang, Nichols and Johnson (2009)]. Perone Paci- 
fico et al. (2004, 2007) proposed data-dependent thresholds so that FDR is 
controlled at the cluster level, using Gaussian random field theory to ap- 
proximate the null distribution. However, the definition of Type I error for 
clusters requires a tolerance parameter for the overlap between a discov- 
ered cluster and the null region [Perone Pacifico et al. (2004)], while spatial 
smoothing, which is often applied for improving signal-to-noise ratio (SNR), 
creates the need to remove the spread of the signal over the null region to 
avoid error inflation [Perone Pacifico et al. (2007)]. Chumbley and Friston 
(2009) have argued that current cluster methods are unsatisfactory because, 
just like marginal FDR procedures, they rely on the basic premise of having 
a test at each spatial location; instead, inference should be topological. 

This article proposes a different multiple testing paradigm where tests 
are performed, not at each spatial or temporal location, but only at the 
local maxima of the smoothed data, seen as topological representatives of 
their neighborhood region or cluster. A similar idea was recently proposed 
independently by Chumbley et al. (2010), but they did not consider whether 
Type I error could be controlled. Here we extend the classical control of 
FWER via the global maximum to control of both FWER and FDR via 
local maxima. Because the distributional theory for local maxima of random 
fields is more difficult than that for global maxima, this paper only considers 
one-dimensional domains (spatial or temporal), where closed- form solutions 
exist, leaving the two- and three-dimensional cases for future work. 

Our general proposed algorithm consists of the following steps: 
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Fig. 1. Simulated observed sequence y{t) (green) and smoothed sequence y-y^t) (blue) 
over five underlying true peaks of different shapes comprising fj,(t) (red). Out of 33 local 
maxima of y~,{t) (yellow), the BH detection threshold at FDR level 0.2 (dashed magenta) 
selects five, one of which is a false positive. At this noise level, four out of five true peaks 
are detected. Note that this bandwidth is able to distinguish the overlapping peaks. 



(1) Kernel smoothing: to increase SNR [Smith and Nichols (2009), Wors- 
ley et al. (1996a)]. 

(2) Candidate peaks: find local maxima of the smoothed sequence. 

(3) p-values: computed at each local maximum under the complete null 
hypothesis of no signal anywhere. 

(4) Multiple testing: apply a multiple testing procedure and declare as 
detected peaks those local maxima whose p-values are significant. 

In this paper, the p-values in step (3) are computed using theory of Gaus- 
sian processes. For step (4), we consider two standard multiple testing pro- 
cedures: Bonferroni to control FWER and Benjamini-Hochberg (BH) [Ben- 
jamini and Hochberg (1995)] to control FDR. The algorithm is illustrated 
by a simulated example in Figure 1. 

We study the theoretical properties of the above algorithm under a specific 
signal-plus-noise model and then relax these assumptions in the simulations. 
For Type I errors to be well defined, the signal is modeled as if composed 
of unimodal peak regions, each considered detected if a significant local 
maximum occurs inside its finite support. For simplicity, we concentrate on 
positive signals and one-sided tests, but this is not crucial. For tractability, 
the theory assumes that the observation noise follows a smooth stationary 
ergodic Gaussian process. This assumption permits an explicit formula for 
computing the p-values corresponding to local maxima of the observed pro- 
cess. The distribution of the height of a local maximum of a Gaussian process 
is not Gaussian but has a heavier tail, and its computation requires careful 
conditioning based on the calculus of Palm probabilities [Adler, Taylor and 
Worsley (2010), Cramer and Leadbetter (1967)]. 

An interesting and challenging aspect of inference for local maxima is the 
fact that the number of tests, equal to the number of observed local maxima, 
is random. The multiple testing literature usually assumes the number of 
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tests to be fixed. We overcome this difficulty with an asymptotic argument 
for large search space, so that by ergodicity, the error behaves approximately 
as it would if the number of tests were equal to its expected value. 

In order to achieve strong control of FWER and FDR, the asymptotics 
for large search space are combined with asymptotics for strong signal. The 
strong signal assumption asymptotically eliminates the false positives caused 
by the smoothed signal spreading into the null regions, by assuring that 
each signal peak region is represented by only one observed local maximum 
within the true domain with probability tending to one. The strong signal 
assumption is not restrictive in the sense that the search space may grow 
exponentially faster. Simulations show that error levels are maintained at 
finite search spaces and moderate signal strength. 

Defining detection power as the expected fraction of true peaks detected, 
we prove that the algorithm is consistent in the sense that its power tends 
to one under the above asymptotic conditions. We find that the optimal 
smoothing kernel is approximately that which is closest in shape and band- 
width to the signal peaks to be detected, akin to the so-called matched filter 
theorem in signal processing [Pratt (1991), Simon (1995)]. This optimal 
bandwidth is much larger than the usual optimal bandwidth for nonpara- 
metric regression. 

In one dimension, the problem of identifying significant local maxima is 
similar to that of peak detection in signal processing [e.g., Arzeno, Deng 
and Poon (2008), Baccus and Meister (2002), Brutti et al. (2005), Harezlak 
et al. (2008), Morris et al. (2006), Yasui et al. (2003)]. In this literature, 
though large, the detection threshold is predominantly chosen heuristically 
and conservatively. Our multiple testing viewpoint provides a formal mech- 
anism for choosing the detection threshold, allowing detection under higher 
noise conditions. This view also eliminates the need to estimate an unknown 
number of peak location parameters, encountered in the signal estimation 
approach [Li and Speed (2000, 2004), O'Brien, Sinclair and Kramer (1994), 
Tibshiraniet al. (2005)]. 

We illustrate our procedure with a data set of neural electrical recordings, 
where the objective is to detect action potentials representing cell activity 
[Baccus and Meister (2002), Segev et al. (2004)]. The noise parameters and 
signal peak shape are estimated from a training set and then applied to 
a test set for peak detection. 

The data analysis and all simulations were implemented in Matlab. 

2. Theory. 

2.1. The model. Consider the signal-plus-noise model 
(1) y(t)=//(t) + z(t), tGR, 
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where the signal ii{t) is a train of unimodal positive peaks of the form 

oo 

(2) //(t)= Y^ Ujhjit), aj>0, 

j=-oo 

and the peak shape hj{t) > has compact connected support Sj = {t : hj{t) > 
0} and unit action Jg, hj{t) dt = l for each j. Let w^{t) > with bandwidth 

parameter 7 > be a unimodal kernel with compact connected support and 
unit action. Convolving the process (1) with the kernel w.y{t) results in the 
smoothed process 



(3) y^{t) = w^{t) * y{t) = Wj{t- s)y{s)ds = fi^{t) + z^{t), 

J —00 

where the smoothed signal and smoothed noise are defined as 

00 

(4) fi^{t) = Wy{t) * fi{t) = y^ ajhj^^{t), Zy{t) = Wj{t) * z{t). 

j=-oo 

For each j, the smoothed peak shape hj^^{t) = Wry(t) * hj(t) > is uni- 
modal and has compact connected support Sj^^ and unit action. For each j, 
we require that hj^^{t) is twice differentiable in the interior of Sj^-y and has 
no other critical points within its support. For simplicity, the theory requires 
that the supports Sj^^ do not overlap (but this is not required in practice, 
as shown via simulations in Section 3). The smoothed noise z^{t) defined 
by (3) and (4) is assumed to be a zero-mean thrice differentiable stationary 
ergodic Gaussian process. 

2.2. The STEM algorithm. Suppose we observe y{t) defined by (1) in 
the segment [— L/2,L/2], which contains J peaks. We call the following 
procedure STEM (Smoothing and TEsting of Maxima). 

Algorithm 1 (STEM algorithm). 

(1) Kernel smoothing: construct the process (3), ignoring the boundary 
effects at ibL/2. 

(2) Candidate peaks: find the set of local maxima of y'y(i) in [— L/2,L/2] 



(5) f=lte 



L L 
2"' 2" 



:i,(„ = Mfl = o,S,(«) = i^<0 



(3) p-values: for each t £ T compute the p- value p-y{t) for testing the 
(conditional) hypothesis 

no{t):n{t)=0 vs. nA{t):fi{t)>0, tet. 

(4) Multiple testing: let m be the number of tested hypotheses, equal to 
the number of local maxima in T. Apply a multiple testing procedure on 
the set of m p-values {p^ (t) , t E T} , and declare significant all peaks whose 
p-values are smaller than the significance threshold. 
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Steps (1) and (2) above are well defined under the model assumptions 
(for data on a grid, local maxima are defined as points higher than their 
neighbors). Step (3) is detailed in Section 2.3 below. For step (4), we use the 
Bonferroni procedure to control FWER and the BH procedure to control 
FDR. To apply Bonferroni at level a, declare significant all peaks whose 
p- values are smaller than ajrh. To apply BH at level a, find the largest 
index k for which the ith smallest p- value is smaller than ia/rh, and declare 
as significant the k peaks with smallest p- values. Notice that, in contrast to 
the usual application of the Bonferroni and BH procedures, the number of 
tests m is random. 

2.3. p-values. Given the observed heights y-yit) at the local maxima 
t €T, the p-values in step (3) of Algorithm 1 are computed as 

(6) p^{t) = F^[y^it)], tGT, 
where 

(7) F^{u) = P{z^{t)>u\t€f} 

denotes the right cumulative distribution function (cdf) of z^{t) at the local 
maxima t £T, evaluated under the complete null hypothesis ^(t) = 0, Vt. 

The conditional distribution (7) is called a Palm distribution [Adler, Tay- 
lor and Worsley (2010), Chapter 6]. Unlike the marginal distribution of z^{t), 
it is not Gaussian but stochastically greater. This is because the point of 
evaluation t G T is not a fixed point t G M, but the random location of a local 
maximum of z^(t). Moreover, the conditioning event has probability zero. 
The Palm distribution (7) has a closed-form expression, originally obtained 
by Cramer and Leadbetter [(1967), Chapter 11] (equation 11.6.14), using 
the well-known Kac-Rice formula [Rice (1945), Adler and Taylor (2007), 
Chapter 11]. A direct application, borrowing notation from those sources, 
gives the following. 

Proposition 2. Suppose the assumptions of Section 2.1 hold and that 
fj,{t) =0,Vi. Define the moments 

(8) a^ = var[z^{t)], A2,7 = var[i^(t)], X4^^ = vaT[z^{t)]. 
Then the distribution (7) is given by 




where A = a^X4^j — ^2^ , and (/)(x), <^(x) are the standard normal density 
and cdf, respectively. 

The quantities a^, A2,7 and X^^y in Proposition 2 depend on the ker- 
nel Wy{t) and the autocorrelation function of the original noise process z(t). 
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Explicit expressions may be obtained, for instance, for the following Gaus- 
sian autocorrelation model, which we use later in the simulations. 

Example 3 (Gaussian autocorrelation model). Let the noise z{t) in (1) 
be constructed as 

/■oo 1 /t- s\ 
z{t)=a -(pi jdB{s), o-,i/>0, 

where B{s) is standard Brownian motion and z^ > 0. Convolving with a Gaus- 
sian kernel w^{t) = {l/j)<j){t/'y) with 7 > as in (4) produces a zero-mean 
infinitely differentiable stationary ergodic Gaussian process 



00 



1 . ft- s 



z^{t) = w^{t)*z{t) = aj -q^l-^\dB{s), C = ^7^ + ^^ 

with moments (8) given by a^ = a^/{2^/^^^), \2,y = a"^ /{4:^/^^^^), X^^^ = 
3a'^/{8^/^^S,^). The above expressions may be used as approximations if the 
kernel, required to have finite support, is truncated at t = ±76? for moder- 
ately large d, say d = 3. 

2.4. Error definitions. Because truly detected peaks may be shifted with 
respect to the true peaks as a result of noise, we define a significant local 
maximum to be a true positive if it falls anywhere inside the support of 
a true peak. Conversely, we define it to be a false positive if it falls outside 
the support of any true peak. Assuming the model of Section 2.1, define the 
signal region Si and null region Sq) respectively, by 

J ILL 

(10) §1 = U Sj and §0 = 



i=i 



2' 2 




For a significance threshold u, the total number of detected peaks and the 
number of falsely detected peaks are 

R{u) = #{t€f:y^{t)>u} and V{u) = #{t €f nE,o:y^{t) > u}, 

respectively. Both are defined as zero if T is empty. The FWER is defined 
as the probability of obtaining at least one falsely detected peak 

(11) FWER(n) = P{y(n)>l} = p|rnSo/0 and max y^{t) > u} . 

The FDR is defined as the expected proportion of falsely detected peaks 

V{u) 



(12) FDR(u) = E 



R{u) V 1 
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Original signal hj{t) 



Smoothed signal hj^-y[t) 



— ^i^h -^ HM^^^ 

Fig. 2. Schematic signal and null regions, before and after smoothing, in the vicinity of 
one signal peak. 

Note that the above definitions are with respect to the original signal 
support §1, while the inference is carried out using the smoothed observed 
process y-yit). Kernel smoothing enlarges the signal support and increases 
the probability of obtaining false positives in the null regions neighboring 
the signal [Perone Pacifico et al. (2007)]. In contrast to (10), the smoothed 
signal region Si^^ D Si and smoothed null region §0,7 C §0 are 



L L 
2"' 2" 




(13) §i^-y = M Sj^^ and §0,7 = 

respectively (Figure 2). We call the difference between the expanded signal 
support and the true signal support the transition region 

J 

(14) T^ = Si,^\Si=So\So,7=U^:'-.'>' 

where Tj^^ = Sj^^ \ Sj is the transition region corresponding to each peak j. 
In general, a true peak may produce more than one significant local max- 
imum, affecting the interpretation of definition (12) and the nonasymptotic 
validity of the FDR controlling procedure. However, as explained below, this 
multiplicity is unlikely to occur for strong signals, assuring validity at least 
asymptotically under that regime. The simulations of Section 3.1 show it 
not to be problematic in nonasymptotic situations for moderate signals and 
appropriate smoothing. 

2.5. Strong control of FWER. In Algorithm 1, step (3) produces a list 
of fh p- values. If the Bonferroni correction is applied in step (4) with level 
a S (0, 1), then the null hypothesis Tioit) at t G T is rejected if 

(15) p-f{t)<^ ^^ yj{t) > UBon = f::^ ( ^ 

m ' \m 

where a/rfi is defined as 1 if rfi = 0. Recall that, in contrast to the usual 
Bonferroni algorithm, the number of p- values m is random. 
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Define the conditions: 

(CI) The assumptions of Section 2.1 liold. 

(C2) L — 7- oo and a = infj a^ — )■ oo, such that (log L)/a'^ — )■ and J/L — )■ Ai 
with 0<^i < 1. 

Theorem 4. Suppose that Algorithm 1 is applied with the Bonferroni 
threshold ttBon (15). Then, under conditions (CI) and (C2), 

limsupFWER(nBon) < «■ 

The proof of Theorem 4 is given in Section 6.2. The large search space 
assumption in (C2) solves the problem of rh being random, implying that 
by the weak law of large numbers, the ratio rh/L is close to its expec- 
tation E[m/L] for large L. Thus the Bonferroni procedure with random 
threshold (15) has asymptotically the same error control properties as if the 
threshold were deterministic and equal to 

(16) nBo. = F^~^(^)^F-^^ "/^ 



^ \E[m]J T V^i+E[mo,7(0,l)] 
where 



(17) E[mo,7(0,l)] 



1 /A4^ 



is the expected number of local maxima of z^{t) in the unit interval (0, 1) 
[Cramer and Leadbetter (1967), Chapter 10]. 

The strong signal assumption in (C2) implies (Lemma 10 in Section 6.1) 
that, with probability tending to 1, no local maxima are obtained in the 
transition region T^ (14), and exactly one local maxima is obtained for each 
signal peak in §i. This avoids the error inflation due to smoothing and 
provides the approximation in (16). The proof of Lemma 10 shows that the 
asymptotic rates are exponential and controlled partially by the smallest 
absolute derivative of the smoothed peak shape in the transition region and 
the curvature of the smoothed peak shape at the mode. 

2.6. Control of FDR. Suppose the BH procedure is applied in step (4) 
of Algorithm 1. For a fixed a S (0,1), let k be the largest index for which 
the ith smallest p- value is less than ia/rh. Then the null hypothesis T-Loi^t) 
at t E T is rejected if 

, . ka , . ^ -[ / ka\ 

18 Pj{t)<— ^^ y.,{t)>UBn = F-^ — ), 

m ' \m J 

where ka/rh is defined as 1 if m = 0. 
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Theorem 5. Suppose that Algorithm 1 is applied with the BH thresh- 
old ubh (18). Then, under conditions (CI) and (C2), 

limsupFDR(nBH) < a- 

The proof of Theorem 5 is given in Section 6.3. The asymptotic assump- 
tions (C2), imply that the BH procedure with random threshold (18) has 
asymptotically the same error control properties as if the threshold were 
deterministic and equal to 

* 17-1 f aAi 

(19) u^n = F^ Ui + E[mo,,(0,l)](l-a) 

where £[7710,^(0, 1)] is given by (17). The threshold (18) can be viewed as the 
largest solution of the equation aG{u) = F^{u), where G{u) is the empirical 
right cumulative distribution function of y^{t),t £ T [Genovese, Lazar and 
Nichols (2002)]. Taking the limit of that equation as L gets large yields the 
solution (19). 

As before, the strong signal assumption in (C2) implies that there exists 
exactly one significant local maximum at each true peak with probabil- 
ity tending to 1 (Lemma 10 in Section 6.1), avoiding error inflation in the 
transition region and justifying the interpretation of definition (12) as the 
expected proportion of falsely discovered peaks. Again, the asymptotic rates 
are exponential and controlled partially by the smallest absolute derivative 
of the smoothed peak shape in the transition region and the curvature of 
the smoothed peak shape at the mode. 

Notice that, in contrast to the asymptotic Bonferroni threshold «Bon (16) 
which grows unbounded with increasing L, the asymptotic BH thresh- 
old Ugjj (19) is finite. 

2.7. Power. Recall from Section 2.4 that a significant local maximum is 
considered a true positive if it falls in the true signal region §1. We define 
the power of Algorithm 1 as the expected fraction of true discovered peaks 



Power (u) = E 
(20) 



1 "^ 

-- > iIT n Sj ^ and max y^ (i) > u 



i=i 

1 ^ 
-^Powerj(u), 



where Powerj(n) is the probability of detecting peak j 

(21) Powerj(n) = p\fnSj / and max y^(i) > u\. 



MULTIPLE TESTING OF LOCAL MAXIMA IN ID 11 

The maximuni operator above indicates that if more than one significant 
local maximum fall within the same peak support, only one is counted, 
so power is not inflated. However, this has no effect asymptotically because 
each true peak is represented by exactly one local maximum of the smoothed 
observed process with probability tending to 1 (Lemma 10 in Section 6.1). 
The next result indicates that both the Bonferroni and BH procedures are 
asymptotically consistent. The proof is given in Section 6.4. 

Theorem 6. Let the power be defined by (20), and let ttBon o-nd ubh ^e 
the Bonferroni and BH thresholds (15) and (18), respectively. Under condi- 
tions (CI) and (C2), 

Power (■UBon) — ^ 1) Power (mbh) — ^ 1- 

For pointwise tests, if there exists a signal anywhere, the BH procedure 
is more powerful than the Bonferroni procedure [Benjamini and Hochberg 
(1995)]. This is also true in our case. Comparing (16) and (19), if J > 1, the 
threshold u'^^^ is higher than the threshold ^^bH' promising a larger power 
for the BH procedure. 

2.8. Optimal smoothing kernel. The best smoothing kernel w.^{t) is that 
which maximizes the power (20) under the true model. Because this maxi- 
mization is analytically difficult, we resort to a less formal argument here. 
Lemma 10 in Section 6.1 shows that, under conditions (CI) and (C2), every 
true peak j is represented by exactly one significant local maximum located 
in a small neighborhood containing the true peak mode Tj with probability 
tending to 1. Thus the power for peak j (21) may be approximated as 

ajhj^^yTj) — ■ 



(22) Powerj(n)RsP{y^(Tj)>u} = $ 

because y-y^Tj) ~ N{ajhj^^{Tj),a'^). By Lemma 13 in Section 6.4, the asymp- 
totically equivalent thresholds (16) and (19) for the Bonferroni and BH pro- 
cedures satisfy u^on/'^j ~^ ^ ^^^ ^bh/^j ~^ ^ f°^ ^^y J- Thus, for large Uj, 
the power (22) is maximized approximately by maximizing the SNR 



(23) SNR. 



-7 






where a is the standard deviation of the observed process y{t). The optimal 
smoothing kernel w^^t) is that which is closest to hj{t) in an L2 sense. This 
result is similar to the matched filter theorem for detecting a single signal 
peak of known shape at a fixed time location t [Pratt (1991), Simon (1995)]. 
The result only holds approximately in our case because the peak locations 
are unknown. 
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Example 7 (Gaussian autocorrelation model). Suppose the signal peak j 
is a truncated Gaussian density hj{t) = (l/6j)(/)[(i — Tj)/bj]l[—Cj,Cj], bj, 
Cj > 0, and let the noise be generated as in Example 3. Ignoring the trun- 
cation, hj^-y{t) = zy-y(t) * hj{t) in (23) is the convolution of two Gaussian 
densities with variances 7^ and 6^, which is another Gaussian density with 
variance 7^ + ^?- Using the moments from Example 3, we have that 

(24) SNR, = ^^^^l^lM = ^ r 1' + -' '"' 

As a function of 7, the SNR is maximized at 



il^ + bp 



(25) argmaxSNR, 



'7 



62-21/2, u<bj/V2, 



7 10, 1/ > bj/\/2. 

In particular, when z/ = 0, we have that the optimal bandwidth for peak j is 
J = bj, the same as the signal bandwidth. We show in the simulations below 
that the optimal 7 is indeed close to (25). 

3. Simulation studies. 

3.1. Nonasymptotic performance. Simulations were used to evaluate the 
performance and limitations of the STEM algorithm for finite range L and 
moderate signal strength a. In a segment of length L = 1000, J = 10 equal 
truncated Gaussian peaks ajhj{t) = a/b(t)[{t — rj)/6]l[— c6, c6], j = 1, . . . , J, 
as in Example 7 with 6 = 3, c = 3 and varying a, were placed at uniformly 
spaced locations Tj = (j — 1/2)L/J, j = 1,..., J, and sampled at integer 
values of t. The noise z{t) was constructed as in Example 3 with a = 1 and 
varying v. Algorithm 1 was carried out using as smoothing kernel a truncated 
Gaussian density w^{t) = {l/^)(j)(t/j)l[—c'y,c'y] as in Example 3 with c = 3 
and varying 7. The noise parameters (8) were estimated independently as 
the empirical moments of smoothed sequences i.i.d. Gaussian noise of length 
1000 and their first and second-order differences, using the same smoothing 
kernel. The Bonferroni and BH procedures were applied at level a = 0.05. 

Figure 3 shows the realized EWER and FDR levels of the Bonferroni 
and BH procedures, evaluated according to (11) and (12) with the expec- 
tations replaced by ensemble averages over 10,000 replications. Error rates 
are maintained below the nominal level a = 0.05 for all bandwidths and 
large enough signal strength a. The convergence is slower, however, when 
the bandwidth 7 is much larger than the signal peak bandwidth 6 = 3. The 
increased error rates are the result of true peak maxima being shifted from 
the original signal region Si into the transition region T-y, where they are 
counted as false positives. This phenomenon disappears with increasing sig- 
nal strength a because the probability of obtaining any local maxima in the 
transition region goes to zero asymptotically (Lemma 10 in Section 6.1). 
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FWER of the Bonferront procedure (top row) and FDR of the BH procedure 
row) for a = 15 (solid), a = 12 (dashed) and a = 9 (dotted). Nominal error level 



As noted in Section 2.4, each true peak may contain more than one local 
maximum of the smoothed data ^^(t). Figure 4 shows that the expected 
number of local maxima per true peak decreases with increasing bandwidth, 
and is essentially equal to 1 for bandwidths equal to or greater than the 
optimal bandwidth. It also gets closer to 1 with increasing signal strength, 
consistent with the result of Lemma 10. 

Figure 5 shows the realized power of the Bonferroni and BH procedures, 
evaluated according to (20) with the expectations replaced by ensemble av- 
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Fig. 4. Average number of local maxima for each true peak for a = 15 (solid), a = 12 
(dashed) and a = 9 (dotted). 
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Fig. 5. Realized (black) and "theoretical" (blue) power of the Bonferrom (top row) and 
BH (bottom row) procedures for o = 15 (solid), a =12 (dashed) and a = 9 (dotted). The 
maxima of the curves (solid circles) approach the asymptotic optimal bandwidth (vertical 



erages over the same 10,000 replications. In all cases, the power increases 
asymptotically to 1 with the signal strength for every fixed bandwidth, and 
is always larger for BH than it is for Bonferroni. The convergence is slower, 
however, when the bandwidth 7 is far from the optimal value. To understand 
the dependence on bandwidth, superimposed is the theoretical approximate 
power (22) evaluated at the asymptotic thresholds n^^^ (16) and Ugjj (19) 
and plugging in the SNR (24). The "theoretical" power curves largely cap- 
ture the shape of the realized ones, but are lower because the asymptotic 
thresholds are more conservative. The curve shape is mostly determined by 
the SNR (24) as a function of 7. The bandwidth 7 producing the largest 
power is always larger than the theoretical optimal bandwidth (25), but it 
approaches it from the right as a increases. 

3.2. Unequal peaks. By assumption (Section 2.1), the signal peaks need 
not be equal. As in Figure 1, J = 5 unequal peaks (Epanechnikov, triangular 
and truncated Gaussian, Laplace and Cauchy, with average half-support 24) 
were corrupted with white standard normal noise. Algorithm 1 was applied 
using a quartic smoothing kernel w.^{t) = 15/(167)[1 — (t/7)^]'^l[— 7,7] with 
varying 7, the noise parameters estimated independently as in Section 3.1. 
For this configuration and 10,000 repetitions, the error was controlled below 
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the nominal level 0.05 for values of 7 up to 40, obtaining a maximum power 
of 0.81 and 0.88 for the Bonferroni and BH procedures at 7 = 18. The max- 
imizing bandwidth represents the average best match between the quartic 
smoothing kernel and the peaks present in the data. 

3.3. Overlapping peaks. The theory of Section 2 assumed that the signal 
peaks had nonoverlapping supports. Simulations similar to those of Sec- 
tion 3.1 with J = 10 partially overlapping peaks showed that the error rates 
were below the nominal level regardless of the amount of overlap between 
peaks. The detection power, however, deceptively increased with increasing 
overlap. This is because definition (20) counts two overlapping peaks as de- 
tected even if only one significant local maximum is found in the overlapping 
region between them, as it belongs to both. Definition (20) does not measure 
the ability to distinguish between overlapping peaks. 

3.4. Comparison with pointwise testing. To see the benefits of testing 
local maxima. Figure 6 compares the performance of the STEM algorithm 
(with Bonferroni and BH corrections) to three other methods that test at 
every single location. Simulated data sets as in Section 3.1 with 6 = 3 and 
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Fig. 6. Left panels: FWER and power of three FWER methods: STEM with Bonferroni 
(black), Bonferroni on all L locations (blue) and Supremum (green). Right panels: FDR 
and power of three FDR methods: STEM with BH (black), BH on all L locations (blue). 
Results in all panels are for a = 15 (solid), a = 12 (dashed) and a = 9 (dotted). Nominal 
error level is 0.05. 



16 A. SCHWARTZMAN, Y. GAVRILOV AND R. J. ADLER 

1/ = were smoothed with varying 7. For the pointwise Bonferroni and BH 
methods, p- values for testing Hq : n{t) = at each t = l,...,L = 1000 were 
computed as p(t) = 1 — ^[y-y{t)/a-y] and then corrected using Bonferroni and 
BH, respectively. The method "Supremum" was adapted from Worsley et al. 
(1996a) as follows. The probability that the supremum of any differentiable 
random process /(t) in the interval [0,T] exceeds u is bounded by [Adler 
and Taylor (2007)] 



(26) P sup f{t)>u\<F[f{0)>u]+E[N^], 

He[0,T] ' 

where N^ is the number of up-crossings by /(t) of the level u in [0,T]. For 
the stationary Gaussian process z^{t)^ application of the Kac-Rice formula 
[Cramer and Leadbetter (1967), page 194] gives that E[A^u] = Li^^Jlv}!^! 
(T.y)(j){u/a'.f). The significance threshold is found as the largest u such that 



(27) Pf sup z,{t)>u)<l-^(-\+L^^l^J-\<a. 

\e[~L/2,L/2] ' \^l) cr^ V'^7/ 

Figure 6 indicates that the pointwise Bonferroni correction is too conser- 
vative. The Supremum method, despite accounting explicitly for the noise 
autocorrelation, performs only slightly better than pointwise Bonferroni, and 
not as well as Bonferroni performed on local maxima. The pointwise BH 
correction is designed to control FDR at the level of individual locations, 
and thus produces too many false positives when the FDR is measured in 
terms of detected peaks using (12). Further simulations with v = \ and v = 1 
yielded similar results (not shown). 

3.5. Automatic bandwidth selection. Rather than using a fixed smooth- 
ing bandwidth 7, the bandwidth may be chosen automatically from the data 
as the one that yields the largest number of discoveries for a fixed error level. 
For simulated data sets as in Section 3.1 with 6 = 3 and ly = 0, the STEM 
algorithm was applied with 7 ranging from 7 = 6/2 = 1.5 to 7 = 26 = 6, and 
results were retained for the bandwidth 7 that yielded the largest number 
of discoveries in each run. Figure 7(a) shows that this automatic criterion 
biases the results toward more detected peaks and therefore results in higher 
error rates (and power) than those obtained when 7 is fixed (Figure 5). It 
also tends to select bandwidths that are smaller than the nominal optimal 
value 7 = 6 [Figure 7(b)], with averages ranging between about 2.1 and 2.9. 

4. Data example. The data consists of recordings from a single elec- 
trode inserted in a salamander's retina, digitized at a sampling frequency 
of 10 kHz. Data of these kind are routinely collected in large amounts in 
neuroscience experiments [Baccus and Meister (2002), Segev et al. (2004)]. 
For the purposes of this paper, three data sets were used: 
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Fig. 7. (a) Power (solid) and realized error rate (dashed) for Bonferrom (black) and 
BH (blue) with automatic bandwidth selection as a function of signal strength a. Nominal 
error level is 0.05. (b) Proportion of automatically chosen smoothing bandwidth 7 over 
1000 simulations for Bonferroni (black) and BH (blue); results are for a = 15 (solid), 
a — 12 (dashed) and a = 9 (dotted). Nominal optimal bandwidth is 7 = 3. 



(1) Test set: 60 seconds of recordings of live cells in the dark. 

(2) Training set 1: 60 seconds of recordings of live cells in the dark. 

(3) Training set 2: 60 seconds of recordings after the retina was allowed 
to die. 

Each period of 60 seconds corresponds to L = 6 x 10^ samples. The goal of 
the analysis was to detect neuronal spikes in the test set (Figure 8, top left). 

Assuming that neuronal action potentials have similar shapes, to maxi- 
mize the SNR (23), the smoothing filter should be close in shape and band- 
width to that of the peaks to be detected. Training set 1 was used to estimate 
the peak shape. In training set 1, spikes with raw maximum exceeding 1 were 
selected and aligned by their maxima [Figure 9(a)]. The peak shape template 
was obtained as the average of the 23 selected major spikes and truncated 
to a length of 100 samples. 

Training set 2, recorded under pure noise conditions, was used to estimate 
the noise parameters. The noise in training set 2 can be well modeled by 
an AR(3) process with autoregressive coefficients —1.13, 0.42 and —0.13, 
estimated by the Yule- Walker algorithm, so that whitening with these coef- 
ficients produces a process whose autocovariance function cannot be distin- 
guished from that of white noise using a Bartlett's test. A similar analysis 
in segments of length L/10 showed that the estimated AR coefficients have 
a coefficient of variation of no more than 1% over the 10 segments, sup- 
porting the stationarity assumption. A Jarque-Bera test of normality for 
the entire sequence returned a p-value of 0.224, supporting the Gaussianity 
assumption. 

Convolving training set 2 with the template of Figure 9(a) produced 
smoothed noise with spectral moments a^ = 4.22 x 10~^, A^ ^ = 1.20 x 10~^ 



18 



A. SCHWARTZMAN, Y. GAVRILOV AND R. J. ADLER 



Full 



Zoom- in 



I 



-a 

T3 
CP 




2 3 4 

time (ms) 



5 6 

xio" 



7000 7200 7400 7600 7800 8000 
time (ms) 



.Pt- 



Jl 



l||lt|lii' Hill iiiilMi|i|il|iiiiii: 

MM 



! ii>iii;:!iiiiliii!iii|iiiiii 'i! iiijiiiiiiiiii 



2 3 4 

time (ms) 



5 6 

xio" 




7000 7200 7400 7600 7800 8000 
time (ms) 



Fig. 8. Top row: the neural spike data (test set); the stars in the right panel indicate peaks 
that are higher than 4 standard deviations of the raw data (dashed line), as suggested by 
Segev et al. (2004)- Bottom row: the data smoothed using the estimated peak shape as 
kernel; the stars indicate significant local maxima higher than the BH threshold (magenta 
dashed line) at level 0.01. The Bonferroni threshold is indicated by the cyan dashed line. 



and A| = 1.96 x 10~^, estimated respectively by the empirical variances of 
the observed process, its first-order difference and its second-order differ- 
ence. Given the length of the process, the standard error of these estimates 
is negligible. 

Algorithm 1 was applied to the test set (Figure 8, top left) by convolving 
it with the template of Figure 9(a), producing the smoothed process in 
Figure 8 (bottom left). In L = 6 x 10^ samples, m = 30,426 local maxima 
were found and their p- values were computed according to (6) and (9), 
plugging in the estimates a'l, \\ and A4^^ found above. The empirical cdf 
of the p-values [Figure 9(b)] shows a large fraction of nonnuh p-values near 
0. For comparison, the same procedure of smoothing, finding local maxima 
and computing their p- values was applied to training set 2. The empirical 
cdf of those p-values is virtually uniform, emphasizing that formula (9) for 
Gaussian noise is appropriate. Also in Figure 9(b), the excess of large p- 
values from the test set is due to the negative portions of the smoothing 
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Fig. 9. (a) 23 strongest spikes aligned by their maximum (black); their average (red) is 
the estimated template, (b) Empirical cdf of p-values for the test set (solid) and training 
set 2 (dashed). 



function [Figure 9(a)]. These produce small negative anti-spikes whose p- 
values are large when tested for positiveness. 

Applying the BH procedure to the m = 30,426 p-values obtained from the 
test set at FDR level 0.01 resulted in a p- value threshold of 2.76 x 10""^ and 
R = 843 significant local maxima. These are indicated in Figure 8 (bottom 
left), showing three levels of spike strengths. Figure 8 (bottom right) zooms 
in to show a few of the weaker spikes. Applying the Bonferroni procedure 
instead in Algorithm 1 resulted in a p-value threshold of 3.29 x 10"^ and 
only 411 detected spikes. 

For comparison, Figure 8 (top right) shows the same segment of the raw 
data and the spikes selected using one of the recommended methods in the 
neuroscience literature, which is to threshold at 4 standard deviations of the 
raw data [Segev et al. (2004)]. Our method is able to identify more spikes at 
a low FDR level of 0.01, but more importantly, it attaches to the findings 
a significance level, expecting about 1% of the detected spikes to be false. 
The conventional method does not offer this useful statistical interpretation. 

As in Section 3.4, computing p-values at each location as p{t) = 1 — 
$(y'y(t)/(7'y), t = l,...,L, and applying a global Bonferroni at level 0.01 
was more conservative, resulting in a height threshold of 1.235 (compara- 
ble to Figure 8 bottom right) and detecting only 393 spikes. Similarly, the 
"Supremum" method, applied by replacing a^ and A^ in (27) at level 0.01, 
yielded a height threshold 1.229 and 394 detected spikes. Finally, applying 
the global BH procedure at level 0.01 with L p-values gave a height thresh- 
old of 0.780 detecting 1149 spikes, but as shown in Section 3.4, this result is 
too optimistic because the actual error rate for peaks is higher than 0.01. 
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5. Discussion. For the theoretical results, the most critical assumptions 
were that the noise process is stationary ergodic Gaussian and that the signal 
peaks are unimodal with compact support. The Gaussianity assumption 
was chosen because it enabled a closed formula for computing the p-values 
associated with the heights of local maxima. For non-Gaussian noise, p- 
values could be computed via Monte Carlo simulation. 

The assumption of compact support for the signal peaks was necessary for 
true and false positives to be well defined. Chumbley et al. (2010) argued for 
testing local maxima when the signal spreads over the entire domain, but in 
that case every positive is a true positive, making the inference unclear. On 
the other hand, agreeing with Chumbley and Friston (2009), applying BH 
globally resulted in inflated error rates for peaks, while applying Bonferroni 
or the Supremum method globally was too conservative. The unimodality 
assumption made local maxima good representatives of true peaks, being 
unique for medium to large bandwidths and asymptotically for increasing 
signal strength. 

The strong signal assumption in condition (C2) was introduced to re- 
move the excess error produced by the smoothed signal spreading into the 
neighboring null regions, thereby enabling asymptotic error control. The 
assumption is not restrictive in the sense that the search space may grow 
exponentially faster. Similar conditions are common for high-dimensional 
data. If the data are pointwise test statistics based on a sample of size n, 
with SNR increasing as a = -^/n — t- cxo, then the condition (logL)/a^ — t- be- 
comes (logL)/n — )• 0. This is similar to the condition {\ogp)/n — )■ required 
for consistent model selection in high-dimensional regression under sparsity 
where p is the number of features [Candes and Tao (2007), Zhang (2010)]. 
Our results, however, do not require sparsity. Condition (C2) is easy to state 
but stronger than needed; upon close inspection of the proof of Lemma 10 
in Section 6.1, the limit of (log L)/a^ need not be zero but need only be 
bounded by a constant that depends on the signal and noise first and sec- 
ond derivatives. 

While the theory was developed for continuous processes, in practice the 
observations are given in a discrete grid. In our simulations we found that 
the results were not reliable when the smoothing bandwidth was smaller 
than the grid spacing, as the theory for continuous random processes is no 
longer a good approximation in that case. 

The asymptotic error control and power consistency did not require the 
peaks to have the same shape or width. The asymptotic results were found 
to hold in practice for a wide range of bandwithds and strong enough signal. 
However, the convergence rate was slower for bandwidths less than half or 
more than double the optimal value. The matched filter principle suggests 
that the smoothing kernel should be chosen to be as close as possible in 
an L^ sense to the peaks to be detected. In the neuronal data analyzed, 
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the peak shape and width were estimated from the data, dictating the best 
smoothing kernel. If the peaks to be detected have different widths, then 
the bandwidth may be adapted to the width of each peak. We leave this 
possibility for future work, as well as the obliged extension of the proposed 
methods to two- and three-dimensional domains. 

6. Technical details. 

6.1. Supporting results. 

Lemma 8. Let rho^-y = i^{t £ Tn §0,7} be be the number of local maxima 
of y-yit) [or z^{t)] in §0,7- Let V^{u) = ^{t £ T Ci So^^:y^{t) > u} be the 
number of local maxima of yy{t) [or z-y{t)] in §9,7 whose heights are above 
the level u. Then 

in probability as L— t-oo, where F^{u) is the Palm distribution (7). 

Proof. Notice that y^{t) = z^{t) for all t G §0,7; so the process y-y(i) 
has the same properties as the stationary process Zry{t) on the set §0,7- By 
ergodicity, the weak law of large numbers applied to the numerator and 
denominator gives that 

V^ ^ #{tGfn§o,^:z^{t)>u}/L 

^0,7 #{tGrnSo,^}/L 

converges to [Cramer and Leadbetter (1967)] 

E[#{t etn Sq,^ : z^jt) > u}] _ E[V^{u)] 

E[#{tGTnSo,7}] ~ E[mo,7] ' 

But also by ergodicity, ratio (28) converges to the conditional probability 
P[z^{t) > u\t G Tn §0,7] = ^'■y{u) by Definition (7). The two limits must be 
equal. D 

Lemma 9. Assume the model of Section 2.1. Let Sj^^ = if^^ U 1™°'^'= U 
/"^ be a partition, where /?i°™ = [cj,dj] C Sj is a fixed interval containing 
the mode of ii^{t) = ajhj^^{t) in Sj as an interior point, such that /ij,7(t) < 
for t e /j"°'i°, /ij-^(i) >0 forte Lf^ and hj^^{t) <0 forte lf^'^\ Let: 

• Mj be the largest value of\hj^^{t)\ in Sj,^; 

• Cj be the smallest value of \hj^^{t)\ in if'^'' = if^ U Lf^^^; 

• Dj be the smallest value of \hj,'y{t)\ in IV^°'^<^, 
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For T given by (5) and any threshold u, 
If 



P(#{tGTn/f'i'=} = 0) 



> 2^(^^\ - 1 - |/fde| J^Vr^( J^ 






(29) 



>$( °J^J ^ _ |j-modc| /^ . 1^ Qj-Pj \ _2$/^-°J'^J 






V \/A^y ■' V ^4,7 V V^A^y V f^7 / ' 

where a^, A2,'y and A4^^ are given by (8) and A6,-y = E['i;^(t)]. 

Proof. (1) Consider first the compact interval l!°^*. The probabihty 
that there are no local maxima of ^^(t) in l!*^^* is greater than the probability 
that y'y{t) > for all t in the interval. This probability is equal to 



P(inf y^(i) >0) > P(inf i7(t) > -inf /i^(t) 

\ rlcft / \ rlcft rlcft 



(30) 



l-P[sup[-i^{t)]>ajCf''], 



rlcft 
J 



where C!''^* > is the smallest value of hj^^{t) in /!°^*. Inequality (26) applies 
above to the stationary Gaussian process —z^{t). The Kac-Rice formula 
[Cramer and Leadbetter (1967), page 194] gives in this case that E[A''u] = 
l-^]'^^*l\/A4^^/Y^A2,7(/)(n/Y^A2,7). Thus (30) has the lower bound 



P(*,*.rnr,^o).*(^)-iri/g<^). 



\/A2,7 / V ^^2,7 V \/A2,7 

A similar calculation for /"^ gives a similar bound with the superscript 

-Yri, 



"left" replaced by "right" and C"^ > being the smallest value of |/ij,7(t)| 



in l"^ . Putting the two together, the required probability P(^{t £ T 
jsidcj-j ^]^g^^ there are no local maxima in /J'^^* nor /"^ is bounded as in the 
first row of (29). 

(2) The probability that y^yit) has no local maxima in /?i°™ is less than the 
probability that y-yicj) < or y-yidj) > 0, for a positive derivative at Cj and 
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a negative one at dj would imply the existence of at least one local maximum 
in Ij. Thus, the probability of no local maxima in /?^°<i'= is bounded above by 

P(#{t G T n If °'i<=} = 0) < P[y^{c,) < 0] + F[y^idj) > 0] 

(31) ^^(z^ikiM) + 1 _^(-^^ki(d,] 



\/^2,7 / V \A 



\2,7 



/ aid 



because yj{t) ~ A^(/i^(t), A2,7) and h^{cj) > Cj >0 and —h^{dj) > Cj > 0. 

On the other hand, the probability that y-yit) has more than one lo- 
cal maxima in /?i°"° is less than the probability that yj{t) > for some t 

in /?i°de_ xhis probability is 



P( sup y^{t) > Oj < P( sup z^{t) > ajDj], 

j j 

where Dj < is the largest value of //^(t) < in /™°dc^ Applying (26) to the 
process z^{t) gives the further upper bound 

model 




p(#{f G^nI™}>l)<l- 
(32) 

Putting (31) and (32) together gives the bound in the second row of (29). 

(3) The probability that no local maxima of y-f{t) in J'^°'^^ exceed the 
threshold u is less than the probability that y-yit) is below u anywhere 
in /™°<i'^^ so it is bounded above by $[(ti — ajMj)/a^]. On the other hand, 

the probability that more than one local maxima of 2/7 (t) in /™°°° exceed u 
is less than the probability that there exist more than one local maximum, 
which is bounded above by (32). Putting the two together gives the bound 
in the third row of (29). D 

Lemma 10. Assume the model of Section 2.1. For T given by (5), let 
"T-1,7 = #{rn§i^^} he the number of local maxima in the set §1,7, and recall 
that W.y{u) = ^{t € Tn§i^-y : 2/7(4) > u} is the number of local maxima in Si^^ 
above threshold u. Under conditions (CI) and (C2).- 

(1) The probability that y^yit) has any local maxima in the transition re- 
gion Ty tends to 0. 

p(#{tGrnT^}>i)^o. 

(2) The probability to get exactly J local maxima in the set Si,-y, 

P(7ni,^ = J) = P(#{t G T n §1,^,} = J) ^ 1. 
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(3) The probability to get exactly J local maxima in the set §i^^ that 
exceed any fixed threshold u, 

V[W^{u) = J]= P[#{t G t n Si,^ : y^{t) >u] = J]^l. 

(4) fhi^^/L — )• Ai in probability. 

(5) W^{u)/rhi^^i — )■ 1 in probability. 



Proof. (1) Write T^ 



U^i^^ 



^, where Tj^^ 



S. 



j^^ \ Sj is the transi- 



tion region for peak j (Figure 2). Under the assumptions of Lemma 9, Tj 



3,1 



is a subset of 7?''^*^ because /!°^* or /"^ may include points inside Sj. Us- 
ing (29), the required probabihty P(#{t G Tn T^} > 1) that y^{t) has any 
local maxima in the transition region T^ is bounded above by 




where a > is the infimum of the Oj's and C > is the infimum of the Cj's, 
that is, the infimum of \hj^-y{t)\ for t G IJ7=i -^f*^^ [recall that every peak hj^^{t) 
has no critical points in the transition region for any j]. But the expression 
above goes to zero under condition (C2) because, for any iT > 0, 



L<j){Ka) 



1 



exp 



logL 



2 







and L[l - ^{Ka)] < L(P{Ka)/{Ka) -^ 0. 

(2) The required probability to obtain exactly J local maxima in the 
set Si^^ = IJi=i ^jn is greater than the probability of obtaining exactly one 
local maximum in each interval JV^°'^'^ c. Sj and none in If'^'^ for any j. Thus, 
using (29), the required probability is bounded below by 



Pi (#{t e t n IJ""^^} = 1 n #{t G T n if"} = o) 



> 1 



Ell 

i=i 



P(#{t G f n /;^°'^"} = 1 n #{t G f n if'^"} 



0)] 



> 1 



E 



5-4$ 



\P^i 



$ 



OjDj 



Vv^ 



4,7 
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>1 



\2,7 V Y^A4^^ / Y A4.^ \yA4^^, 

But this bound goes to 1 under condition (C2) as in part (1). 

(3) The required probabiHty to obtain exactly J local maxima in the set 
^1,7 = Uj=i'S'j,7 that exceed u is greater than the probability that exactly 
one local maximum exceeds u in each interval /?i°°°. This probability is 
bounded below by 
J 



n(#{*' 



side 1 



G T n I™ : y^(t) > n} = 1 n #{t G T n /f^^} = 0) 



j=i 



but this goes to 1 by a similar argument as the one in part (2) of this lemma. 
(4) Since fhi^^/L = {rhi^^/ J){J /L), with J /L — )• ^i, we need to show that 
fhi^^/ J — )• 1 in probability. For any fixed e > 0, 



0<P 



^^1,7 



J 



1 



>e]= P(|?ni,^- J| > Je) < P(mi,T, / J) = 1-P(mi,^ = J) 



since mi^^ and J are integers. But the probability to get exactly J local 
maxima goes to 1 by part (2) of this lemma. 

(5) By part (2) of this lemma, P[W^(n) = J] — t- 1 in probability; therefore, 
using the same arguments as in part (4) of this lemma, we get W^{u)/ J — ?• 1. 
Now, 

W^{u) _ W^{u) J 

But rhi^^/J— >• 1 by part (3) of this lemma. D 
6.2. Strong control of FWER. 



Lemma 11. Let niQ^^ be the number of local maxima in §9,7 o,s in Lem- 
ma 8. Define the thresholds v-Qon = F~ 



F^ -"^(q/E [7710,7]), deterministic. Then |-ubc 
L — )• 00. 



F^ ^(0/7710,7), random, and v^^^ 



^Bonl 



in probability as 



(33) 



Proof. By ergodicity, the weak law of large numbers gives that 

"10,7 



L 



E[mo,7(0,l)] 
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in probability as L — )• oo, where £[7710,7(0,1)] =£[7710,7]/^, given by (17), 
does not depend on L [Cramer and Leadbetter (1967)]. Since log(-) is con- 
tinuous, the continuous mapping theorem gives that 



log 



mo 
L 



log 



E[mo,7 



L 



log 



mo 
a 



log 



'E[mQ, 



a 



0, 



where we have used the additive property of the logarithm 
Define now the monotone increasing function ^lJ^{x 



F-Hi 



The 



function '^^(x) is Lipschitz continuous for all x > 1 because its derivative 
dip-y{x)/dx = e~^ /Fj[tp^{x)] is bounded for all x > 1. Hence, as L — )■ 00, 



V'7 log 



?T70,7 

a 



ijjy log 



E[?fio. 



a 



^Bo 



^Bo 



0. 



D 



Proof of Theorem 4. Let 7710,7 l£ rh ^e the number of local max- 
ima in the set §0,7 as in Lemma 11, and let UBon = -^-7^(0/7710,7) < iiBon- 
Then FWER(uBon) < FWER(i)Bon)- Further, the bound FWER(i)Bon) is the 
probability of obtaining at least one local maximum greater than t^Bon in 
§0 = §0,7 U ^7, which is less than the probability of obtaining at least one 
local maximum greater than uboii in §0.7 or at least one local maximum 
in T7. 

(34) FWER(tiBon) < P[n(^Bon) > 1] + ?(#{* G T H T7} > 1), 

where Vy{u) = #{t € T fl §0,7 '■ U-rit) > u} as in Lemma 8. 

The second probability in (34) goes to zero by Lemma 10, part (1). To 
bound the first probability in (34), write 

P[^7(t'Bon) > 1] = P] T n §0,7 / and maxy7(t) > (uBon - ^Bon) + ^Bon \ ; 

where v^^^ = F~^{a/E[mQ^^]) is deterministic. For any two random vari- 
ables X, Y and any two constants c, e: P{X > Y + c) < F{X > c — e) + 
P(|y| > e). Applying this inequality with X = max^^^ V'yi't), Y = -Sbo 



^Bon 



and c : 



(35) 



-'Bon' 



P[^7(^Bon)>l]<P[V;(t;Ln-^)>l] 



+ P{r n §0,7 / and [-UBon - ^Bonl > ^}- 

The second summand goes to in probability as L — )■ 00 by Lemma 11. 



For the first summand. Lemma 8 with level v- 



Bon 



e gives that 



P[n(4on -£)>!]< E[y7(^Ln - e)] = E[7flo,7]i^7(^6on " e) 



a 



F^{v 



Bon 



^7(4on) 

but the last fraction goes to 1 as L — )• 00. Replacing in (35) and (34) gives 
the result. D 
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6.3. Control of FDR. 

Lemma 12. For any nonnegative integer random variables V, W and 
fixed positive integer J, 

E(,^)<PiW<.,-l)^ "^l"! 



y + wj - ^ - ' E[y] + J' 

Proof. 

^ ' j;=0 u)=0 



oo CO ^ X 

J— 1 oo 

<Y^'^viy = v,w = w) 

ui=Oi'=0 

oo oo / \ 



v=Qw=J 

V 



<P(W^< J-l) + E 
<V{W <J-l) + 



V + J 

E(y) 



E(y) + j' 

The last inequality holds by Jensen's inequality, since V/{V + J) is a concave 
functionof T^ for T^>0 and J> 1. D 

Proof of Theorem 5. Let G{u) = #{t £ f:y^{t) > u}/#{t g f} be 
the empirical marginal right cdf of ^^(t) given t £T. Then the BH thresh- 
old ubh (18) satisfies aG{uBH) = ka/rh = F^(ubh)i so ubh is the largest u 
that solves the equation 

(36) aG{u) = F.i{u). 

The strategy is to solve equation (36) in the limit when L,a ^ oo. We 
first find the limit of G{u). Letting Vy{u) = #{t E T n §0,7 '■ V-yi't) > u} as in 
Lemma 8 and W-y{u) = #{t G T n Si_^ : ^/^(t) > u}, so that R-y{u) = Vy{u) + 
W^{u), write 

(37) c(^) = -^'r^'") = ^^^'") "^"'^ I ^7('») ^1.7 

m ?7iO,7 "T'0,7 + "T-1,7 "^1,7 "^-0,7 + ?7ll,7 

By the weak law of large numbers (33) and Lemma 10, part (3), 

?7tO,7 _ rflQ^^/L E[?77-o,^(0,l)] 

mo,.y + mi,^ rho^^/L + fhi^ry/L E[mo,^(0, 1)] + Ai 
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as L — 7- oo, where the expectation is given by (17). In addition we have the 
results of Lemma 8 and Lemma 10, parts (4) and (5). Replacing these three 
limits in (37), we obtain 

E[mo,7(0,l)] , Ai 



G{u) ^ F^{u) ' "-^ ' \ + 



E[jno,7(0, 1)] + Ai E[77~io,7(0, 1)] + Ai ' 

Now replacing G{u) by its limit in (36), and solving for u gives the deter- 
ministic solution 

(38) ^'^^'''^^^ = A.+nrh^ilim-o^y 

The FDR at the threshold Ugjj is bounded by Lemma 12 by 

FDR(n&H) < nW{ul^) < J - 1) + ^^iJ^f''^^ J 
(39) =V{W{u*^^)<J-l) 

EinKn)] + E[#{t G T n T^ : y^(i) > u^^]] 



+ 



nV^{ul^)]+n#{tefnT^:y^{t)>ul^]] + f 



where we have split V^(^bh) ™^'-' ^^^ reduced null region §0,7 a-nd the tran- 
sition region T^ = Sq \ §0,7- Under condition (C2), Lemma 10, part (1), gives 

(40) < E[#{t G t n T^ : y^{t) > u^^}] < E[#{t G T n T^}] -^ 0. 

By Lemma 8, the remaining terms of the last fraction in (39) can be written 
as 

E[^7Kh)] _ F7Kh)E[^0,7(0,1)]^ 



Wliu*Bu)] + J ^7Kh)E["^0,7(0, l)]L + J 

F^(u|jH)E[mo,7(0,l)] 



F^(n^H)E[mo,7(0,l)] + J/^' 

Since Ugjj solves (38), for L — t- 00 such that J / L — t- ^1, the above expression 
tends to 

,^-^. «E[mo,^(0,l)] E[mo,^(0,l)] ^^ 

^ ' aE[mo,7(0,l)]+^i + (l-a)E[mo,7(0,l)] "E[mo,7(0, 1)] + ^1 ~ "' 

Combining equations (40), (41) and Lemma 10, part (3), in (39), we obtain 
limsupFDR(ugjj) < a. 

Recall that the BH threshold ubh solves equation (36), and Ugjj satis- 
fies (38), where the empirical marginal distribution, G{u), is replaced by its 
limit. Since F^{t) is continuous, F-y{uBK) — > F-y{u^E)^ leading to 
limsupFDR('UBH) ^ct. D 
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6.4. Power. 

Lemma 13. For any j = 1, . . . ,J , let t be any interior point of the sup- 
port Sj of peak j. Under conditions (CI) and (C2), 

UBoJ[ajhj,^{t)] -^ 0, u^u/iajhj^^it)] -^ 

in probability, where u^^^ and Ugjj are given by (16) and (19), respectively. 

Proof. (1) From (9), for u > a^, F^{u) is bounded above and below by 






(42) ^'A(^) < F,iu) < (Ci + l)</'(^) , Ci 

where the lower bound was obtained using $(x) > 1/2 for a; > 1, and the 
upper bound used the fact that y/X^^^/A > l/a^ and 1 — $(x) < (j){x)/x for 
x>\. Let V = F^{u). Inverting the bounds in (42) we obtain 

(43) 2(j2 log ^^^ - logt; ) < n^ < 2a^ ( log ^j^ - logw 



2V27r ) 'V V27r 

Applying these inequalities to v* = F^f(u'^^^ and w = F^[aj7ij^^(t)] gives that 



0^ Kon)' , log[(Cl + l)/V2^]-log(^*) 

" [«i^i,7W]' log[Ci/(2V2^)]-log(w;) 
Applying L'Hopital's rule, the limit of the above fraction when v* and w go 
to zero is the same as the limit of w/v* . But this limit is zero because, by 
the upper bound in (42) and (16), 

EjWjin^^tC \ l) ^^^ E[mo,^(0, 1)] f ajhj^^jt) \ 

^7Kon) ^ ^ > a V ^7 /' 

which goes to zero by the lemma's conditions. 

(2) The FDR threshold Ugjj (19) is bounded, so the result is immediate. 

Proof of Theorem 6. For any threshold u, the detection power 
Power (u) (20) is greater than E[W^(u)]/J > Y>\W^{u) = J]. But this prob- 
ability goes to 1 by Lemma 10, part (3), particularly for the deterministic 
thresholds Ug^^^ and tigjj. It was shown in the proofs of Theorems 4 and 5 
that the gap between the deterministic thresholds and the random thresh- 
olds UBon and ubh narrows to zero asymptotically. Therefore the power for 
these thresholds goes to 1 as well. □ 
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